Reward inference of discrete-time expert's controllers: A complementary learning approach

Authors

Abstract

Uncovering the reward function of optimal controllers is crucial to determining the desired performance that an expert wants to inject into a certain dynamical system. In this paper, a reward inference algorithm for discrete-time expert's controllers is proposed. The approach is inspired by the complementary mechanisms of the striatum, neocortex, and hippocampus for decision making and experience transference. These systems work together to infer the reward function associated with the expert's controller using the merits of data-driven online learning methods. The proposed approach models the neocortex system as two independent learning algorithms given by a Q-learning rule and a gradient identification rule. The hippocampus is modelled as a least-squares update rule that extracts the relation between the states and control inputs from data. The striatum is modelled as an inverse learning rule which iteratively finds the hidden reward function. Lyapunov stability theory is used to show the convergence of the approach. Simulation studies are given to demonstrate the effectiveness of the algorithm.
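The abstract gives no implementation detail, so the sketch below only illustrates one ingredient it names: a least-squares update rule that extracts the relation between recorded states and control inputs. The linear state-feedback form u_k ≈ -K x_k, the recursive least-squares formulation, and all names in the code are assumptions of this illustration, not the paper's algorithm; the Q-learning, gradient identification, and inverse reward-recovery components are not shown.

```python
import numpy as np

def rls_feedback_identification(states, inputs, forgetting=1.0):
    """Recursive least-squares estimate of a linear feedback gain K
    such that u_k ≈ -K x_k, from recorded expert data.

    states : (T, n) array of states x_k
    inputs : (T, m) array of expert controls u_k
    Returns the estimated gain K with shape (m, n).
    """
    T, n = states.shape
    m = inputs.shape[1]
    theta = np.zeros((n, m))        # parameters of the model u ≈ theta.T @ x
    P = 1e3 * np.eye(n)             # covariance of the parameter estimate
    for k in range(T):
        x = states[k].reshape(n, 1)
        u = inputs[k].reshape(m, 1)
        denom = forgetting + float(x.T @ P @ x)
        gain = (P @ x) / denom               # RLS gain vector
        err = u.T - x.T @ theta              # prediction error, shape (1, m)
        theta = theta + gain @ err
        P = (P - gain @ x.T @ P) / forgetting
    return -theta.T                          # K such that u ≈ -K x


if __name__ == "__main__":
    # Synthetic expert data from a known feedback law u = -K x (sanity check).
    rng = np.random.default_rng(0)
    K_true = np.array([[1.2, -0.4]])
    X = rng.standard_normal((200, 2))
    U = X @ (-K_true).T + 0.01 * rng.standard_normal((200, 1))
    print("estimated K:", rls_feedback_identification(X, U))
```

The recursive form matches the online setting emphasized in the abstract: each new state-input pair refines the estimate without reprocessing stored data.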


Similar Articles

Discrete-time Fractional-order Controllers

The theory of fractional calculus goes back to the beginning of the theory of differential calculus but its inherent complexity postponed the application of the associated concepts. In the last decade the progress in the areas of chaos and fractals revealed subtle relationships with the fractional calculus leading to an increasing interest in the development of the new paradigm. In the area of a...

Dopamine, reward learning, and active inference

Temporal difference learning models propose phasic dopamine signaling encodes reward prediction errors that drive learning. This is supported by studies where optogenetic stimulation of dopamine neurons can stand in lieu of actual reward. Nevertheless, a large body of data also shows that dopamine is not necessary for learning, and that dopamine depletion primarily affects task performance. We ...

Task Completion Transfer Learning for Reward Inference

Reinforcement learning-based spoken dialogue systems aim to compute an optimal strategy for dialogue management from interactions with users. They compare their different management strategies on the basis of a numerical reward function. Reward inference consists of learning a reward function from dialogues scored by users. A major issue for reward inference algorithms is that important paramet...
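The excerpt is cut off before any algorithmic detail; purely as a generic illustration of reward inference from scored dialogues (not this paper's method), the sketch below fits linear reward weights to user ratings by ridge regression over hand-chosen dialogue features. The feature set, the regularization strength, and all names are assumptions of the example.

```python
import numpy as np

def infer_reward_weights(dialogue_features, user_scores, reg=1e-2):
    """Fit linear reward weights w so that dialogue_features @ w ≈ user_scores.

    dialogue_features : (N, d) array, one feature vector per scored dialogue
                        (e.g. length, task-completion flag, confirmation count).
    user_scores       : (N,) array of numerical user ratings.
    reg               : ridge regularization strength.
    """
    X = np.asarray(dialogue_features, dtype=float)
    y = np.asarray(user_scores, dtype=float)
    d = X.shape[1]
    # Ridge least squares: (X^T X + reg*I) w = X^T y
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)

# Toy usage: three hand-made dialogue feature vectors and their user scores.
X = np.array([[10, 1, 2], [25, 0, 6], [8, 1, 1]], dtype=float)
y = np.array([4.5, 1.0, 5.0])
print(infer_reward_weights(X, y))
```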

Continuous-Time and Discrete Multivariable Decoupling Controllers

The paper is focused on a design and implementation of a decoupling multivariable controller. The controller was designed in both discrete and continuous-time versions. The control algorithm is based on polynomial theory and pole-placement. A decoupling compensator is used to suppress interactions between control loops. The controller integrates an on-line identification of an ARX model of ...
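The excerpt ends mid-sentence at the on-line ARX identification step; as an illustration only (using a batch least-squares fit for brevity where the entry describes an on-line identification), the sketch below estimates SISO ARX parameters from input-output data. The model orders, names, and toy system are assumptions of this example.

```python
import numpy as np

def fit_arx(u, y, na=2, nb=2):
    """Batch least-squares fit of a SISO ARX model
        y[k] = -a1*y[k-1] - ... - a_na*y[k-na] + b1*u[k-1] + ... + b_nb*u[k-nb].
    Returns theta = [a1..a_na, b1..b_nb]."""
    u, y = np.asarray(u, dtype=float), np.asarray(y, dtype=float)
    start = max(na, nb)
    rows, targets = [], []
    for k in range(start, len(y)):
        # Regressor of delayed outputs (negated) and delayed inputs.
        row = [-y[k - i] for i in range(1, na + 1)] + \
              [u[k - j] for j in range(1, nb + 1)]
        rows.append(row)
        targets.append(y[k])
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta

# Toy usage: data generated by y[k] = 0.6*y[k-1] + 0.3*u[k-1].
rng = np.random.default_rng(0)
u = rng.standard_normal(300)
y = np.zeros(300)
for k in range(1, 300):
    y[k] = 0.6 * y[k - 1] + 0.3 * u[k - 1] + 0.01 * rng.standard_normal()
print(fit_arx(u, y, na=1, nb=1))   # approx [-0.6, 0.3]
```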

Branching time controllers for discrete event systems

We study the problem of synthesizing controllers for discrete event systems in a branching time framework. We use a class of labelled transition systems to model both plants and specifications. We use first simulations and later bisimulations to capture the role of a controller; the controlled behaviour of the plant should be related via a simulation (bisimulation) to the specification. For both s...
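The excerpt is truncated before the construction; purely to illustrate the standard simulation check it refers to (not this paper's synthesis procedure), the sketch below computes the largest simulation relation between two finite labelled transition systems by fixpoint refinement. The dictionary encoding of transitions and the toy plant/specification are assumptions of the example.

```python
def largest_simulation(trans_a, trans_b, states_a, states_b):
    """Largest simulation relation R ⊆ states_a × states_b:
    (s, t) ∈ R iff every labelled step s -l-> s' can be matched by
    some t -l-> t' with (s', t') ∈ R.

    trans_a / trans_b : dict mapping (state, label) -> set of successor states.
    """
    relation = {(s, t) for s in states_a for t in states_b}
    labels = {label for (_, label) in trans_a}
    changed = True
    while changed:                            # refine until a fixpoint is reached
        changed = False
        for (s, t) in list(relation):
            for label in labels:
                for s_next in trans_a.get((s, label), set()):
                    # t must offer some label-successor related to s_next.
                    if not any((s_next, t_next) in relation
                               for t_next in trans_b.get((t, label), set())):
                        relation.discard((s, t))
                        changed = True
                        break
                if (s, t) not in relation:
                    break
    return relation

# Toy plant and specification over labels {"go", "stop"}.
plant = {("p0", "go"): {"p1"}, ("p1", "stop"): {"p0"}}
spec  = {("q0", "go"): {"q1"}, ("q1", "stop"): {"q0"}, ("q0", "stop"): {"q0"}}
R = largest_simulation(plant, spec, {"p0", "p1"}, {"q0", "q1"})
print(("p0", "q0") in R)   # True: the specification can simulate the plant
```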


Journal

Journal Title: Information Sciences

Year: 2023

ISSN: 0020-0255, 1872-6291

DOI: https://doi.org/10.1016/j.ins.2023.02.079